Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification
Abstract
The bag of visual words (BOW) model is an efficient image representation technique for image categorisation and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorisation tasks. Most approaches that use the BOW model to categorise images ignore useful information that can be obtained from image classes when building visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper we show that integrating visual vocabularies generated from each image category improves the BOW image representation and the accuracy of natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from the training images of one scene image dataset can plausibly represent another scene image dataset in the same domain. This helps to reduce the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories respectively, using 10-fold cross-validation. The experimental results, obtained using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
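As a rough illustration of two ideas named in the abstract (integrating vocabularies built per image category, and comparing BOW histograms with a histogram intersection kernel), the following minimal sketch may help. It is not the authors' implementation: the use of plain k-means for vocabulary building, the random toy descriptors, and all function names (`kmeans`, `integrated_vocabulary`, `bow_histogram`, `histogram_intersection`) are assumptions made for illustration. The colour moments, keypoint density weighting and spatial pyramid layout are omitted.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers


def integrated_vocabulary(descriptors_by_class, words_per_class):
    """Cluster each class's descriptors separately, then stack the per-class vocabularies."""
    vocabularies = [
        kmeans(d, words_per_class, seed=i) for i, d in enumerate(descriptors_by_class)
    ]
    return np.vstack(vocabularies)


def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and return an L1-normalised histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()


def histogram_intersection(A, B):
    """Gram matrix with K[i, j] = sum_k min(A[i, k], B[j, k])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


# Toy data standing in for local descriptors from three scene categories (hypothetical).
rng = np.random.default_rng(0)
descriptors_by_class = [rng.normal(loc=c, size=(200, 8)) for c in (0.0, 3.0, 6.0)]

vocab = integrated_vocabulary(descriptors_by_class, words_per_class=5)
hists = np.array([bow_histogram(d[:50], vocab) for d in descriptors_by_class])
K = histogram_intersection(hists, hists)
```

A Gram matrix such as `K` could then be passed to a support vector machine that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.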
Similar resources
AUTHORS’ CHECK LIST - Title: Fusing Integrated Visual Vocabularies-Based Bag of Visual Words and Weighted Colour Moments on Spatial Pyramid Layout for Natural Scene Image Classification
The bag of visual words (BOW) model is an efficient image representation technique for image categorisation and annotation tasks. Building good visual vocabularies, from automatically extracted image feature vectors, produces discriminative visual words which can improve the accuracy of image categorisation tasks. Most approaches that use the BOW model in categorising images ignore useful infor...
Recognizing in the depth: Selective 3D Spatial Pyramid Matching Kernel for object and scene categorization
This paper proposes a novel approach to recognize object and scene categories in depth images. We introduce a Bag of Words (BoW) representation in 3D, the Selective 3D Spatial Pyramid Matching Kernel (3DSPMK). It starts quantizing 3D local descriptors, computed from point clouds, to build a vocabulary of 3D visual words. This codebook is used to build the 3DSPMK, which starts partitioning a wor...
Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
Considering the Spatial Layout Information of Bag of Features (BoF) Framework for Image Classification
The spatial pooling method such as spatial pyramid matching (SPM) is very crucial in the bag of features model used in image classification. SPM partitions the image into a set of regular grids and assumes that the spatial layout of all visual words obey the uniform distribution over these regular grids. However, in practice, we consider that different visual words should obey different spatial...
Spatial Fisher Vectors for Image Categorization
We introduce an extension of bag-of-words image representations to encode spatial layout. Using the Fisher kernel framework we derive a representation that encodes the spatial mean and the variance of image regions associated with visual words. We extend this representation by using a Gaussian mixture model to encode spatial layout, and show that this model is related to a soft-assign version o...
Journal title:
- Signal, Image and Video Processing
Volume: 7
Publication date: 2013